Using Numba in a Python Backend App
Summary
This tutorial shows how to speed up a Python backend app with the Numba package.
What is Numba?
Numba is an open-source just-in-time (JIT) compiler that translates a subset of Python and NumPy code into fast machine code.
See the Numba documentation for installation instructions.
Why is Numba used?
Numba lets you speed up your applications with high-performance functions written directly in Python. With a few simple annotations, array-oriented and math-heavy Python code can be just-in-time compiled to machine code with performance similar to C, C++, and Fortran, without switching languages or Python interpreters. Numba's main features are:
- On-the-fly code generation
- Native code generation for the CPU (default) and GPU hardware
- Integration with the Python scientific software stack (NumPy)
When should Numba be used?
Numba works best on code that uses NumPy arrays and functions, broadcasting, and loops.
When should Numba not be used?
Numba does not understand pandas, so code that works with pandas.DataFrame objects will not benefit from Numba.
How do you measure Numba's performance?
Numba has to compile a function before it can execute the machine code version of it, and compilation takes time. Once compiled, Numba keeps the machine code version in memory, so later calls with the same argument types reuse it instead of compiling again. When you measure performance, call the function once before you start timing so that compilation time is not included in the result.
How is Numba used?
- Wrap the code you want to speed up into a function.
- Add Numba's JIT decorator @jit to the function. Set nopython=True for the best performance (@njit is shorthand for @jit(nopython=True)).
Example: the count_triples function below runs nested loops over a NumPy array. The @jit decorator makes Numba compile the function to machine code the first time it is called; lambda_handler then times the compiled version against the original Python version (count_triples.py_func).
```python
import time

import numpy as np
from numba import jit
from corva import Api, Cache, Logger, ScheduledNaturalTimeEvent, scheduled


@jit(nopython=True)
def count_triples(arr, n):
    # Count index triples i < j < k with arr[k] < arr[i] < arr[j]
    cnt = 0
    for i in range(0, n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                if arr[k] < arr[i] < arr[j]:
                    cnt += 1
    return cnt


@scheduled
def lambda_handler(event: ScheduledNaturalTimeEvent, api: Api, cache: Cache):
    arr = np.random.randint(1, 101, 200)
    n = len(arr)

    # DO NOT MEASURE THE FIRST RUN. COMPILATION TIME IS INCLUDED IN ITS EXECUTION TIME!
    count_triples(arr, n)

    # NOW THE FUNCTION IS COMPILED: TIME A CALL THAT REUSES THE CACHED MACHINE CODE
    start = time.perf_counter()
    count_triples(arr, n)
    end = time.perf_counter()
    Logger.info("JIT function duration = %s" % (end - start))

    # MEASURE THE SAME CODE AS PLAIN PYTHON (py_func IS THE ORIGINAL, UNCOMPILED FUNCTION)
    start = time.perf_counter()
    count_triples.py_func(arr, n)
    end = time.perf_counter()
    Logger.info("Regular function duration = %s" % (end - start))
```
- Add Numba to requirements.txt:
```
corva-sdk==1.5.2
pytest==7.1.2
numba==0.50.1
```
- Deploy the app (see Getting Started, Section 4: Upload and Publish).
- Check the logs.
Example of Numba logs